Introduction

The goal of this assignment is to dig further in factor and plot management. This assignment includes four parts.

For this assignment, I will be using the gapminder dataframe.

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(plotly))
library(knitr)
library(gapminder)

Part 1: Factor management

For this part, I choose the varibles country and continent. The first thing to do is to ensure the varibles I’m exploring are indeed facrots.

Are country and continent factors?

is.factor(gapminder$continent)
## [1] TRUE
is.factor(gapminder$country)
## [1] TRUE

Look at the levels of continent

fct_count(gapminder$continent) %>% 
kable()
f n
Africa 624
Americas 300
Asia 396
Europe 360
Oceania 24

Drop Oceania -level and factor

Let’s drop all the rows of Oceania, the new data frame is named as gap_drop_oceania.

gap_drop_oceania <- gapminder %>%
    filter(continent != "Oceania")
fct_count(gap_drop_oceania$continent) %>% 
kable()
f n
Africa 624
Americas 300
Asia 396
Europe 360
Oceania 0
nrow(gapminder)
## [1] 1704
nrow(gap_drop_oceania)#check the number of rows before and after change
## [1] 1680

From the table above, we can know that all the rows of Oceania have been removed, but the factor still remains. The original gapminder data frame has 1704 rows and 5 factors. The new gap_drop_oceania data frame has 1680 rows and 5 factors.

Now, let’s drop the unused factor, the new data frame is named as gap_no_oceania.

gap_no_oceania <- gap_drop_oceania %>% 
  droplevels()
fct_count(gap_no_oceania$continent) %>% 
kable()
f n
Africa 624
Americas 300
Asia 396
Europe 360
nrow(gap_no_oceania)#check the number of rows after change
## [1] 1680

It looks like that Oceania has been removed as an unused factor. The new gap_no_oceania data frame has 1680 rows and 4 factors.

Reorder the levels of country and continent

This task is to use the forcats package to change the order of the factor levels, based on a principled summary of one of the quantitative variables. I will reorder the columns of gap_no_oceania based on the maximum value of the pop. I plot the data frame in order to show the order better,

gap_no_oceania %>%
    mutate(continent = fct_reorder(continent, pop, max)) %>%
    ggplot(aes(continent, pop, fill = continent)) +
    scale_y_log10()+
    geom_violin() +
    labs(title = "population each continent")+
    theme(plot.title = element_text(hjust = 0.5)) #center the title

Now, let’s explore the effects of arrange() individually.

gap_no_oceania %>%
    group_by(continent, pop) %>%
    summarise(max_pop = max(pop)) %>%
    arrange(desc(max_pop)) %>%
    ggplot(aes(continent,pop, fill = continent)) +
    scale_y_log10()+
    geom_violin() +
    labs(title = "population each continent")+
    theme(plot.title = element_text(hjust = 0.5)) #center the title

Comparing these two plots above, we can know that:

  • arrange() only has effects on rows, not on factors, so the order of continent does NOT change in the second plot.

  • factor reordering coupled with arrange() does have effects on the order of the factor, so continent is in the order of the increase of the maximum value of pop.

Part 2: File I/O

Make a new data frame

To begin with this part, I will first make a small and reasonable data frame called gap_Europe. This data frame includes all the lifeExp of each European countries after 1972. We can see see that it is in the order of alphabet.

gap_Europe <- gapminder %>%
    filter(continent == "Europe"& year >= 1972) %>% 
    select(country,lifeExp)
gap_Europe %>%     
head() # just show the first few lines of the long table 

To see it clearly, I put it into a graph.

gap_Europe %>%
  ggplot(aes(country, lifeExp)) +
  geom_point() +
  labs(title = "LifExp in Europe after 1972") +
  theme(axis.text.x = element_text(angle = 90))+
  # use the theme function to rotate x axis to avoid overlapping
  theme(plot.title = element_text(hjust = 0.5)) #center the title

Make the data frame non-alphabetical

Now let’s make it non-alphabetical.The new data frame is called gap_Europe_reorder. I make it in the order of the increase of the minimum value of lifeExp and plot the graph below which looks more reasonable.

gap_Europe_reorder<- gap_Europe %>% 
  mutate(country = fct_reorder(country, lifeExp, min))
gap_Europe_reorder %>% 
  ggplot(aes(country, lifeExp)) +
  geom_point() +
  labs(title = "Life expectancy in Europe after 1972") +
  theme(axis.text.x = element_text(angle = 90))+
  # use the theme function to rotate x axis to avoid overlapping
  theme(plot.title = element_text(hjust = 0.5)) #center the title

Experiment with write_csv()/read_csv()

write_csv(gap_Europe_reorder,"gap_Europe_reorder.csv")
read_csv("gap_Europe_reorder.csv") %>% 
  ggplot(aes(country, lifeExp)) +
  geom_point() +
  labs(title = "Life expectancy in Europe after 1972") +
  theme(axis.text.x = element_text(angle = 90))+
  # use the theme function to rotate x axis to avoid overlapping
  theme(plot.title = element_text(hjust = 0.5)) #center the title
## Parsed with column specification:
## cols(
##   country = col_character(),
##   lifeExp = col_double()
## )

It’s abvious that the order does NOT survive - it is alphabetical again now.

Experiment with saveRDS()/readRDS()

saveRDS(gap_Europe_reorder,"gap_Europe_reorder.rds")
readRDS("gap_Europe_reorder.rds") %>% 
  ggplot(aes(country, lifeExp)) +
  geom_point() +
  labs(title = "Life expectancy in Europe after 1972") +
  theme(axis.text.x = element_text(angle = 90))+
  # use the theme function to rotate x axis to avoid overlapping
  theme(plot.title = element_text(hjust = 0.5)) #center the title

Yes, the order keeps the same through the round trip of saveRDS()/readRDS().

Experiment with dput()/dget()

dput(gap_Europe_reorder,"gap_Europe_reorder.dput")
dget("gap_Europe_reorder.dput") %>% 
  ggplot(aes(country, lifeExp)) +
  geom_point() +
  labs(title = "Life expectancy in Europe after 1972") +
  theme(axis.text.x = element_text(angle = 90))+
  # use the theme function to rotate x axis to avoid overlapping
  theme(plot.title = element_text(hjust = 0.5)) #center the title

Yes, the order also keeps the same through the round trip of dput()/dget().

Part 3: Visualization design

Remark a figure

In this part, the task is to remark a new figure, in light of something I learned in the recent class meetings about visualization design and color. I am going to replot the first figure I plotted in assignment 2. In this figure, I meaned to show the relationship between lifeExp and gdpPercap, but the plot looks squished at the bottom.

ggplot(gapminder,aes(x=lifeExp, y=(gdpPercap)))+
  geom_point(color='steelblue',size=1)

Now, let’s remark this figure and call it new_gap.

new_gap <-gapminder %>% 
  group_by(continent,year) %>% 
  ggplot(aes(y=lifeExp, x=gdpPercap,color = lifeExp))+
  facet_wrap(~continent)+
  scale_x_log10()+
  geom_point(size=1)+
  labs(title = "Relationship between lifeExp and gdpPercap")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5),
        axis.text  = element_text(size=12),
        strip.background = element_rect(fill = "orange"))
new_gap

Now, the figure looks more reasonable based on the following aspects:

    1. by scale_x_log10(), the plots do NOT look squished any more.
    1. by facet_wrap(), the original figure is faceted into 5 figures by continents, so the relationship can be seen more clearly for each continent
    1. by color=, the new figure shows the value of lifeExp intuitivly.
    1. the title is added and centered which completes the figure.

Convert it into a plotly graph (Results only applicable in .html)

new_gap %>% 
  ggplotly()

Convert it into a 3D plot (Results only applicable in .html)

Now, take Asia for an example, let’s add year to form a z-axis for a 3D plot.

gapminder %>% 
   filter(continent == "Asia") %>% 
   plot_ly(
        x = ~gdpPercap, 
        y = ~lifeExp, 
        z = ~year,
        color = ~country, # color by country
        type = "scatter3d",
        mode = "markers",
        opacity = 0.5) %>% 
   layout(xaxis = list(type = "log"), 
       yaxis = list(type = "log" ))  # log x and y
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

Part 4: Writing figures to file

new_gap # choose new_gap to save

ggsave("new_gap_file.png",new_gap, width=15, height=10, units = "cm")
load Graph

load Graph